Computing k Centers over Streaming Data for Small k
Abstract
In this paper, we consider the k-center problem for streaming points in R^d. More precisely, we consider the single-pass streaming model, in which each point in the stream may be examined only once and only a small amount of information can be stored. Because memory is much smaller than the data in the streaming model, it is important to develop algorithms whose space complexity does not depend on the number of input points. We present an approximation algorithm for k = 2 that guarantees a (2 + ε)-factor using O(d/ε) space and update time in arbitrary dimensions for any metric. We show that our algorithm can be extended to approximate an optimal k-center within a factor of (2 + ε) for k > 2.
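To make the single-pass model concrete, here is a minimal sketch of a standard threshold rule for one fixed radius guess r. It is not the algorithm of this paper, and it does not by itself achieve the (2 + ε) factor; the class name `ThresholdKCenter`, its fields, and the Euclidean distance are assumptions made for illustration. The sketch stores at most k points, so its memory does not grow with the stream length. If the optimal k-center radius is at most r, every point seen so far lies within 2r of a stored center; otherwise the sketch may report that the guess was too small.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


class ThresholdKCenter:
    """Single-pass k-center sketch for one fixed radius guess r.

    Hypothetical helper for illustration; not the paper's method.
    """

    def __init__(self, k: int, r: float) -> None:
        self.k = k
        self.r = r
        self.centers: List[Tuple[float, ...]] = []
        self.failed = False  # True once the guess r is shown to be too small

    def update(self, p: Point) -> None:
        """Process one stream point; the point is never revisited."""
        if self.failed:
            return
        # p is covered if it lies within 2r of some stored center.
        if any(math.dist(p, c) <= 2 * self.r for c in self.centers):
            return
        if len(self.centers) == self.k:
            # k + 1 points pairwise more than 2r apart cannot be covered
            # by k balls of radius r, so the guess r was too small.
            self.failed = True
            return
        self.centers.append(tuple(p))


if __name__ == "__main__":
    sketch = ThresholdKCenter(k=2, r=1.0)
    for point in [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]:
        sketch.update(point)
    print(sketch.centers, sketch.failed)
```

The stored centers are pairwise more than 2r apart, which is why a (k + 1)-th uncovered point proves the guess wrong: two points in the same optimal cluster of radius r are at most 2r apart by the triangle inequality.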
Related resources
Efficient Clustering Algorithms for Out of Core, Distributed and Streaming Data
Clustering has been one of the most widely studied topics in data mining, and it is often the first step of the data mining process. Today, we are witnessing enormous growth in data volume. Often, data is distributed or arrives as a stream. Efficient clustering in all these scenarios becomes a very challenging problem. Our work is in the context of the k-means clustering algorithm. k-...
Clustering High Dimensional Dynamic Data Streams
We present data streaming algorithms for the k-median problem in high-dimensional dynamic geometric data streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space {1, 2, ..., ∆}^d. Our algorithms use k ε^{-2} poly(d log ∆) space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of k centers the cost of...
k-Means for Streaming and Distributed Big Sparse Data
We provide the first streaming algorithm for computing a provable approximation to the k-means of sparse Big Data. Here, sparse Big Data is a set of n vectors in R^d, where each vector has O(1) non-zero entries, and d ≥ n. Examples include the adjacency matrix of a graph and web-link, social-network, document-term, or image-feature matrices. Our streaming algorithm stores at most log n · k input points in memo...
Streaming k-means approximation
We provide a clustering algorithm that approximately optimizes the k-means objective in the one-pass streaming setting. We make no assumptions about the data, and our algorithm is very lightweight in terms of memory and computation. This setting is applicable to unsupervised learning on massive data sets or on resource-constrained devices. The two main ingredients of our theoretical work are: ...
IMPACTS AND CHALLENGES OF CLOUD COMPUTING FOR SMALL AND MEDIUM SCALE BUSINESSES IN NIGERIA
Cloud computing technology is giving businesses, whether micro, small, medium, or large-scale enterprises, a level playing field. Small and medium enterprises (SMEs) that have adopted the cloud are taking their businesses to greater heights with the competitive edge that cloud computing offers. The limitations faced by SMEs in procuring and maintaining IT infrastructures has be...
Journal title:
- Int. J. Comput. Geometry Appl.
Volume 24
Publication date: 2014